In previous lectures we have seen strings being used numerous times. Today we are going to go into a bit more detail. First, some terminology:
Okay, let's begin!
A string is basically a bunch of Unicode characters, which makes it the ideal data type for storing written text. The syntax:
{quote-character} unicode characters {MATCHING quote-character}
Examples:
And just as with numbers, we can also convert other data-types to strings using the str function.
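As a minimal sketch of the syntax above (the variable names here are made up purely for illustration), here are a few string literals, plus the str function applied to some non-string values:

```python
# Single and double quotes create exactly the same kind of object.
single = 'hello'
double = "hello"
print(single == double)   # the quote style is not stored as part of the string

# An empty pair of quotes is still a string.
empty = ""
print(type(empty))

# str() converts other data types to strings.
print(str(42), str(3.14), str(True))
```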
The reason Python accepts single OR double quote characters is to make it easier to deal with text that actually contains quote characters. Suppose for instance we wanted to store the following sentence as a string:
"Ahhh!!!! spiders!", cried the monster. "Do not worry" said our hero, "I have a sharp spoon".
Wow, I'm hooked! With epic character development like that, maybe I should be writing novels instead of programming tutorials?
Anyway, I digress. The point is that if I try to save this sentence with double quotes, problems occur. But I can save the string as is if I wrap it in single-quote characters, as demonstrated by the next two code snippets.
In [15]:
# wrapping text with double quotes...
cool_story_bro = ""Ahhh!!!! spiders!", cried the monster. "Do not worry" said our hero, "I have a sharp spoon"."
print(cool_story_bro)
In [7]:
# wrapping text with single quotes...
cool_story_bro = '"Ahhh!!!! spiders!", cried the monster. "Do not worry" said our hero, "I have a sharp spoon".'
print(cool_story_bro)
When I first wrote the example it took me about 10 minutes to actually get it working; I just couldn't figure out what the problem was!
It turns out that in the original draft my spider-maiming hero said the phrase:
“don’t worry”
The ' character in don't was messing up my attempt to enclose the whole string within single quotes. Here, let me show you:
In [8]:
cool_story_bro = '"Ahhh!!!! spiders!", cried the monster. "Don't worry" said our hero, "I have a sharp spoon".'
print(cool_story_bro)
So what was my genius solution? Well, obviously we cheat and change the text!
“don’t worry” --> “do not worry”
Problem...err….solved?
Anyway, Python does have ways of handling such inputs. Your homework for this week is to figure out how to make my intended string work. If it takes you less than 10 minutes then congratulations, you figured it out faster than I did. :)
Just like the int() and float() functions, the str() function is a good way to convert one data type to another. If I have an integer and I want to store it as a string, I can simply call the str() function and Python will do the rest. The code snippet below will take any float/integer and return a string representation of that number.
In [9]:
def num_to_string(number):
    """takes a number of type float/int, returns string of that number"""
    return str(number)
# For an explanation of the next three lines of code, please see the 'calling functions' lecture.
a = num_to_string(4555549099511) # large integer
b = num_to_string(-0.0044352334) # negative float
c = num_to_string(4.3e10) # scientific notation
print(a, type(a))
print(b, type(b))
print(c, type(c))
# and notice that we can use the float/int functions to convert the strings back to numbers just as easily...
print( float(c), type(float(c)) )
One reason you might want to store a number as a string is that strings give you access to more 'methods', which may make some processes easier.
For example, let's suppose I want to find out what the first two digits of a number are. Converting the number to a string makes this easy, since strings are iterable and can be indexed into, whereas numbers cannot. That's a lot of technical jargon right now, but don't worry, we shall cover indexing later.
In [4]:
def first_two_digits(number):
    n = str(number)  # <-- convert number to string
    n = n[:2]        # <-- get the first two characters via slicing (more on slicing later).
    n = int(n)       # <-- convert n back to a number.
    return n
print(first_two_digits(100000))
print(first_two_digits(933323))
print(first_two_digits(11))
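To back up the claim that strings can be indexed while numbers cannot, here is a small sketch. It also includes a hypothetical variant, first_two_digits_unsigned (my own name, not something from earlier lectures), which assumes the input is an integer and drops a leading minus sign before slicing; without that step, a number like -42 would be sliced to the characters "-4".

```python
def first_two_digits_unsigned(number):
    """Hypothetical variant: ignore the sign of an integer before slicing."""
    digits = str(abs(number))   # abs() removes any leading minus sign
    return int(digits[:2])

number = 12345

try:
    number[0]                   # integers do not support indexing
except TypeError as error:
    print("Indexing an int fails:", error)

print(str(number)[0])           # the string version can be indexed
print(first_two_digits_unsigned(-933323))
print(first_two_digits_unsigned(7))
```

Floats are a different story (a value like 0.5 has a decimal point among its first two characters), so this sketch deliberately sticks to integers.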
Text frequently has ‘meta-data’ attached to it. By meta-data in this context I’m mainly talking about things like HTML tags: font colour, size, stylings (e.g. bold, italic), and so on.
The normal process for handling this is to have the code embedded into the text itself. In other words, the text itself has characters that Python has to parse as commands.
But for some applications you might want the ability to literally print every character passed in. For instance, directly below we have two lines of text: a pink heading, and some text with tags. Crucially, these two pieces of text are the same; the difference in what we see is the difference between literally printing the HTML tags versus executing them.
This is a heading
<h1 style="color:pink;">This is a heading</h1>
So, how does the computer know to interpret text in one way and not the other? Well, the solution is something called “escape characters”.
Just for completeness: to show you the tags that produce the pink text, I had to use several HTML escape characters. I typed the following monstrosity:
&lt;h1 style="color:pink;"&gt;This is a heading&lt;/h1&gt;
That's a complex line of jargon I couldn't have done without the help of this tool. So yeah, escaping in HTML can be a bit tricky, but fortunately for us escape characters in Python are a bit easier to work with.
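As an aside, Python's standard library can do this sort of HTML escaping for you. The snippet below is a minimal sketch using the html module; it is not necessarily what the tool mentioned above does, and the variable names are just for illustration.

```python
import html

heading = '<h1 style="color:pink;">This is a heading</h1>'

# html.escape swaps &, < and > (and, by default, quote characters) for HTML entities,
# so a browser shows the tags as text instead of acting on them.
escaped = html.escape(heading)
print(escaped)

# html.unescape reverses the process.
print(html.unescape(escaped) == heading)
```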
Consider the following lines of code.
In [5]:
a = "\\"
b = "\"
At first glance this code seems perfectly fine, right? The variable 'a' should be the string \\, right? And variable 'b' should just be a single backslash. But we don't get that; Python throws an error!
What’s going on here? Well, the reason is that the backslash character (\) is an escape character in Python. In "\" the backslash escapes the closing quote, so Python never finds the end of the string. To actually get Python to literally print "\\" or "\" we would have to type out:
In [6]:
a = "\\\\" # double \\
b = "\\" # single \
print(a, b)
# Note that I didn't have to do any escaping in the comments; that's because Python just ignores comments!
It is important to be aware of these Python features, because if you don't know this stuff you can easily be 'caught out' the moment you start trying to parse complex strings. In what follows I have a (hopefully humorous) example of why you should care about this stuff. Let’s talk pathing.
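If you want to convince yourself that the doubled backslashes really are stored as single characters, a quick sketch like the one below helps. It uses the built-in len and repr functions, which this lecture has not covered in any detail.

```python
a = "\\\\"   # four characters typed, two backslashes stored
b = "\\"     # two characters typed, one backslash stored

# len() counts the characters actually stored in the string.
print(len(a), len(b))

# repr() shows a string the way you would have to type it in source code.
print(repr(b))
```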
In [18]:
directory = "C:\Documents\pictures\selfies"
print(directory)
So let's imagine we are building some sort of code that saves a directory as a string for use later on. If we print this particular directory we get no surprises, it just works as we would expect.
But hold up, what if I wanted to send my girlfriend a naughty photo? Inside of my 'selfies' folder I have a 'nudes' folder. And inside the 'nudes' folder I have a plethora of JPEGs: my little sausage pictured from a variety of different angles wearing an assortment of novelty hats.
“Wait, did he just say little?”
On this occasion however, let's pretend I'm not a total weirdo (debatable); I want to send her something arty, something classy.
[scurries through folder...]
[finds ... 'tasteful.jpeg' ]
Alright, let's code that up and see what happens...
In [19]:
directory2 = "C:\Documents\pictures\selfies\nudes\tasteful.jpeg"
print(directory2)
Oh dear! It seems like Python doesn't want me to send dick-pics over the internet after all! That's a pity, a big pity (wink wink).
What has gone wrong? Well, basically every time Python sees a backslash character it looks to see what the next character is. In the case of directory2 above, we have the following: \D, \p, \s, \n, \t
The first time we ran the code we didn't get any errors because \D, \p and \s were not special 'commands' (although recent versions of Python do print a warning about such unrecognised escape sequences). However, both \n and \t are special commands in Python. These commands get executed and we get a different result.
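To see exactly which characters ended up inside the string, here is a short sketch on a trimmed-down version of the path (the variable names are my own). It compares the naive spelling with one where each backslash is escaped.

```python
typed_naively = "selfies\nudes\tasteful.jpeg"     # \n and \t get interpreted
typed_escaped = "selfies\\nudes\\tasteful.jpeg"   # each backslash escaped by hand

# repr() reveals the hidden newline and tab characters.
print(repr(typed_naively))

# The naive version is two characters shorter and contains no backslashes at all.
print(len(typed_naively), len(typed_escaped))
print(typed_naively.count("\\"), typed_escaped.count("\\"))
```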
As an aside, \n is a very useful command to use within strings. It starts a new line, and splitting data up into separate lines frequently comes in useful.
"{some text}\n{more text}"
Simple example:
In [1]:
greeting = "hello\nworld"
print(greeting)
# using \t (which is tab)
greeting = "hello\tworld"
print(greeting)
# There are other commands of course, but I feel that most of them are not useful enough to be worth teaching.
In short, \n is a newline, and \t is a tab. Thus, if we are trying to save/open files or folders on Windows systems whose names start with t or n, we can end up having some difficulties.
There are a few solutions to this problem. If you are dealing with directories specifically then the best choice is to use the os module. This module will fix a number of these issues for you (the os module works on Linux and Windows machines).
For example:
In [4]:
import os
directory = "C:\Documents\pictures\selfies"
photo_name = "santa_hat2.jpeg"
## the bad way
path_to_photo_1 = directory + "\\" + photo_name
## the good way
path_to_photo_2 = os.path.join(directory, photo_name)
print(path_to_photo_1)
print(path_to_photo_2)
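One thing to keep in mind is that os.path.join uses the separator of whichever operating system the code is running on, so the output above will differ between Windows and Linux. As an alternative sketch (not something the rest of this lecture relies on), the standard library's pathlib module has a PureWindowsPath class that always follows Windows rules, which makes the result the same on any machine:

```python
from pathlib import PureWindowsPath

# Backslashes are doubled here so that none of them is read as an escape.
directory = PureWindowsPath("C:\\Documents\\pictures\\selfies")
photo_name = "santa_hat2.jpeg"

# The / operator joins path components with the right separator.
path_to_photo = directory / photo_name

print(path_to_photo)
print(path_to_photo.name)     # just the file name
print(path_to_photo.suffix)   # just the extension
```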
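However, the above method only works for file systems. How can we solve this problem in a more general way? One option is a 'raw string': put an r in front of the opening quote and Python treats every backslash as an ordinary character.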
In [5]:
string1 = r"\nevery\nword\nis\non\na\nnew\nline" # notice the 'r' BEFORE the double-quote mark?
string2 = "\nevery\nword\nis\non\na\nnew\nline" # without the 'r', for comparison.
print("The raw string version looks like this:\n", string1)
print("\n")
print("The normal version of string looks like this:\n", string2)
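Applying the same idea to the troublesome path from earlier, a minimal sketch (variable names are illustrative) shows that the r prefix and hand-escaped backslashes end up storing exactly the same characters:

```python
escaped = "C:\\Documents\\pictures\\selfies\\nudes\\tasteful.jpeg"   # every backslash doubled
raw = r"C:\Documents\pictures\selfies\nudes\tasteful.jpeg"           # typed once, as-is

print(raw)
print(escaped == raw)      # both spellings store the same text
print(raw.count("\\"))     # all five backslashes survived
```

One catch worth knowing: a raw string cannot end in a single backslash, so r"C:\Documents\" is still a syntax error.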
In [12]:
# Repeating strings
# {string} * {integer}
# Examples:
print("a" * 10)
print("abc" * 3)
In [13]:
# Concatenation
# {string} + {string}
# Examples:
print("ab" + "c")
print("a" + "b" + "c")
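Concatenation only works between strings, which is one place the str function from earlier earns its keep. A small sketch, with made-up variable names:

```python
count = 3

try:
    message = "hats: " + count        # str + int is not allowed
except TypeError as error:
    print("Python complains:", error)

message = "hats: " + str(count)       # convert first, then concatenate
print(message)
```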
In [10]:
# Membership
# {string} in {string}
# Examples:
print("a" in "ab")
print("a" in "cb")
print("abc" in "aabbcc") # must be an exact match.
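A few more membership checks, as a sketch of what "exact match" means in practice: the characters must appear side by side, in the same order, and in the same case.

```python
print("abc" in "aabbcc")    # False: the letters are there, but never side by side in that order
print("abc" in "xxabcxx")   # True: found as one unbroken run
print("A" in "abc")         # False: membership is case-sensitive
print("" in "abc")          # True: the empty string counts as part of every string
```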
In [ ]:
# Your answer here…